Main Protocol
(1) Construct a co-expression network
This section heavily represents the content from the WGCNA tutorials created by the author’s of WGCNA, found here: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/
## Setting WGCNA options ##
options(stringsAsFactors = FALSE)
## Quality Control of Expression Data ##
#Using a cluster tree to find sample outliers
sampleTree = hclust(dist(norm_exp_mat), method = "average")
sizeGrWindow(12,9)
par(cex = 0.6);
par(mar = c(0,4,2,0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,
cex.axis = 1.5, cex.main = 2)
## Power Calculation for Network Construction ##
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function
sft = pickSoftThreshold(norm_exp_mat, powerVector = powers, verbose = 5, networkType = "signed")
pickSoftThreshold: will use block size 1529.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 1529 of 29255
..working on genes 1530 through 3058 of 29255
..working on genes 3059 through 4587 of 29255
..working on genes 4588 through 6116 of 29255
..working on genes 6117 through 7645 of 29255
..working on genes 7646 through 9174 of 29255
..working on genes 9175 through 10703 of 29255
..working on genes 10704 through 12232 of 29255
..working on genes 12233 through 13761 of 29255
..working on genes 13762 through 15290 of 29255
..working on genes 15291 through 16819 of 29255
..working on genes 16820 through 18348 of 29255
..working on genes 18349 through 19877 of 29255
..working on genes 19878 through 21406 of 29255
..working on genes 21407 through 22935 of 29255
..working on genes 22936 through 24464 of 29255
..working on genes 24465 through 25993 of 29255
..working on genes 25994 through 27522 of 29255
..working on genes 27523 through 29051 of 29255
..working on genes 29052 through 29255 of 29255
Power SFT.R.sq slope
1 1 0.0985 -74.30
2 2 0.3610 -59.50
3 3 0.6250 -41.80
4 4 0.6770 -27.10
5 5 0.7300 -19.30
6 6 0.7800 -14.40
7 7 0.8490 -10.60
8 8 0.9160 -8.79
9 9 0.9600 -7.29
10 10 0.9810 -6.03
11 12 0.9920 -4.38
12 14 0.9910 -3.37
13 16 0.9900 -2.75
14 18 0.9770 -2.36
15 20 0.9710 -2.08
truncated.R.sq mean.k. median.k.
1 0.978 14600.00 14600.00
2 0.873 7540.00 7530.00
3 0.890 3990.00 3980.00
4 0.901 2170.00 2160.00
5 0.925 1210.00 1200.00
6 0.945 687.00 678.00
7 0.938 399.00 391.00
8 0.954 237.00 230.00
9 0.968 143.00 138.00
10 0.977 88.20 83.60
11 0.994 35.30 32.30
12 0.994 15.20 13.10
13 0.989 7.06 5.61
14 0.970 3.53 2.50
15 0.964 1.91 1.16
max.k.
1 14900.0
2 7900.0
3 4390.0
4 2550.0
5 1540.0
6 967.0
7 632.0
8 448.0
9 333.0
10 258.0
11 174.0
12 131.0
13 105.0
14 87.2
15 74.1
# Plot the results:
sizeGrWindow(9, 5)
par(mfrow = c(1,2));
cex1 = 0.9;
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence")); text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,cex=cex1,col="red");abline(h=0.9,col="red")
# this line corresponds to using an R^2 cut-off of h
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"));text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
#### power = 8
#### lowest power to provide a scale-free topology fit of at least 0.9
#### Low mean connectivity score
net = blockwiseModules(norm_exp_mat, power = 8,
TOMType = "signed", minModuleSize = 20,
networkType = "signed",
reassignThreshold = 0, mergeCutHeight = 0.15,
numericLabels = TRUE, pamRespectsDendro = FALSE,
saveTOMs = TRUE,
saveTOMFileBase = "norm_exp_mat_TOM",
verbose = 3)
Calculating module eigengenes block-wise from all genes
Flagging genes and samples with too many missing values...
..step 1
....pre-clustering genes to determine blocks..
Projective K-means:
..k-means clustering..
..merging smaller clusters...
Block sizes:
gBlocks
1 2 3 4 5 6
4996 4980 4964 4880 4739 4696
..Working on block 1 .
TOM calculation: adjacency..
..will not use multithreading.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
..saving TOM for block 1 into file norm_exp_mat_TOM-block.1.RData
....clustering..
....detecting modules..
....calculating module eigengenes..
....checking kME in modules..
..removing 280 genes from module 1 because their KME is too low.
..removing 318 genes from module 2 because their KME is too low.
..removing 77 genes from module 3 because their KME is too low.
..removing 499 genes from module 4 because their KME is too low.
..removing 168 genes from module 5 because their KME is too low.
..removing 121 genes from module 6 because their KME is too low.
..removing 38 genes from module 7 because their KME is too low.
..removing 2 genes from module 8 because their KME is too low.
..removing 2 genes from module 9 because their KME is too low.
..Working on block 2 .
TOM calculation: adjacency..
..will not use multithreading.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
..saving TOM for block 2 into file norm_exp_mat_TOM-block.2.RData
....clustering..
....detecting modules..
....calculating module eigengenes..
....checking kME in modules..
..removing 530 genes from module 1 because their KME is too low.
..removing 654 genes from module 2 because their KME is too low.
..removing 96 genes from module 3 because their KME is too low.
..removing 188 genes from module 4 because their KME is too low.
..removing 21 genes from module 5 because their KME is too low.
..Working on block 3 .
TOM calculation: adjacency..
..will not use multithreading.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
..saving TOM for block 3 into file norm_exp_mat_TOM-block.3.RData
....clustering..
....detecting modules..
....calculating module eigengenes..
....checking kME in modules..
..removing 1156 genes from module 1 because their KME is too low.
..removing 1071 genes from module 2 because their KME is too low.
..Working on block 4 .
TOM calculation: adjacency..
..will not use multithreading.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
..saving TOM for block 4 into file norm_exp_mat_TOM-block.4.RData
....clustering..
....detecting modules..
....calculating module eigengenes..
....checking kME in modules..
..removing 562 genes from module 1 because their KME is too low.
..removing 360 genes from module 2 because their KME is too low.
..removing 264 genes from module 3 because their KME is too low.
..removing 215 genes from module 4 because their KME is too low.
..removing 151 genes from module 5 because their KME is too low.
..removing 79 genes from module 6 because their KME is too low.
..removing 22 genes from module 7 because their KME is too low.
..removing 6 genes from module 8 because their KME is too low.
..removing 3 genes from module 9 because their KME is too low.
..Working on block 5 .
TOM calculation: adjacency..
..will not use multithreading.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
..saving TOM for block 5 into file norm_exp_mat_TOM-block.5.RData
....clustering..
....detecting modules..
....calculating module eigengenes..
....checking kME in modules..
..removing 657 genes from module 1 because their KME is too low.
..removing 269 genes from module 2 because their KME is too low.
..removing 119 genes from module 3 because their KME is too low.
..removing 130 genes from module 4 because their KME is too low.
..removing 126 genes from module 5 because their KME is too low.
..removing 64 genes from module 6 because their KME is too low.
..removing 53 genes from module 7 because their KME is too low.
..removing 36 genes from module 8 because their KME is too low.
..removing 18 genes from module 9 because their KME is too low.
..removing 33 genes from module 10 because their KME is too low.
..removing 35 genes from module 11 because their KME is too low.
..removing 40 genes from module 12 because their KME is too low.
..removing 13 genes from module 13 because their KME is too low.
..removing 26 genes from module 14 because their KME is too low.
..removing 6 genes from module 15 because their KME is too low.
..removing 5 genes from module 16 because their KME is too low.
..removing 5 genes from module 17 because their KME is too low.
..removing 3 genes from module 18 because their KME is too low.
..removing 6 genes from module 19 because their KME is too low.
..removing 3 genes from module 20 because their KME is too low.
..removing 2 genes from module 21 because their KME is too low.
..Working on block 6 .
TOM calculation: adjacency..
..will not use multithreading.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
..saving TOM for block 6 into file norm_exp_mat_TOM-block.6.RData
....clustering..
....detecting modules..
....calculating module eigengenes..
....checking kME in modules..
..removing 1087 genes from module 1 because their KME is too low.
..removing 681 genes from module 2 because their KME is too low.
..removing 158 genes from module 3 because their KME is too low.
..removing 126 genes from module 4 because their KME is too low.
..removing 58 genes from module 5 because their KME is too low.
..removing 10 genes from module 6 because their KME is too low.
..removing 7 genes from module 7 because their KME is too low.
..merging modules that are too close..
mergeCloseModules: Merging modules whose distance is less than 0.15
Calculating new MEs...
## Saving the Network ##
moduleLabels = net$colors
moduleColors = labels2colors(net$colors)
MEs = net$MEs;
dim(MEs)
[1] 42 59
geneTree = net$dendrograms[[1]];
save(sft, MEs, moduleLabels, moduleColors, geneTree,
file = "./network.RData")
## Loading the Network ##
load("~/Desktop/STAR_Protocols_Paper/data/network/network.RData")
(2) Gene Ontology Analysis
(3) Creating GWAS gene list
# We now have a list of human genes that overlap the regions as defined by LD from our study of interest, however we may also need the equivalent list in mouse
# We can use a homology map to generate a list of mouse homologs for the human gene list from MGI (http://www.informatics.jax.org/faq/ORTH_dload.shtml)
homology <- read_tsv("./data/ref/HOM_MouseHumanSequence.txt")
Parsed with column specification:
cols(
`HomoloGene ID` = [32mcol_double()[39m,
`Common Organism Name` = [31mcol_character()[39m,
`NCBI Taxon ID` = [32mcol_double()[39m,
Symbol = [31mcol_character()[39m,
`EntrezGene ID` = [32mcol_double()[39m,
`Mouse MGI ID` = [31mcol_character()[39m,
`HGNC ID` = [31mcol_character()[39m,
`OMIM Gene ID` = [31mcol_character()[39m,
`Genetic Location` = [31mcol_character()[39m,
`Genomic Coordinates (mouse: , human: )` = [31mcol_character()[39m,
`Nucleotide RefSeq IDs` = [31mcol_character()[39m,
`Protein RefSeq IDs` = [31mcol_character()[39m,
`SWISS_PROT IDs` = [31mcol_character()[39m
)
homology
gwas_gene_hom <- homology %>%
filter(Symbol %in% gwas_region_genes$gene_name)
gwas_mouse_hom <- homology %>%
filter(`HomoloGene ID` %in% gwas_gene_hom$`HomoloGene ID`) %>%
filter(`NCBI Taxon ID` == 10090)
gwas_mouse_genes = gwas_mouse_hom$Symbol